Lecture 13 - Multifactor ANOVA – Bill’s Biostats Website

Other Formats
Lecture 13 - Multifactor ANOVA
Author

Bill Perry


Lecture 12: Review

ANOVA

  • Analysis of variance: single and multi-factor designs
  • Examples: diatoms, circadian rhythms
  • Predictor variables: fixed vs. random
  • ANOVA model
  • Analysis and partitioning of variance
  • Null hypothesis
  • Assumptions and diagnostics
  • Post F Tests - Tukey and others
  • Reporting the results

Lecture 13: Multifactor ANOVA Overview

Lecture 13: 2 Factor or 2 Way ANOVA

Often consider more than 1 factor (independent categorical variable):

  • reduce unexplained variance
  • look at interactions

2-factor designs (2-way ANOVA) very common in ecology

  • Can have more factors (e.g., 3-way ANOVA)
    • interpretation tricky…

Most multifactor designs: nested or factorial


Lecture 13: Nested and factorial designs

Consider two factors: A and B

  • Nested/hierarchical: levels of B occur only in 1 level of A
  • Factorial/crossed: every level of B in every level of A

Lecture 13: Nested and factorial designs

Nested Designs:

  • Factor A usually fixed
  • Factor B usually random

Lecture 13: Nested and factorial designs

Factorial Designs:

  • Both factors typically fixed (but not always)

Lecture 13: Nested designs: examples

Study on effects of enclosure size on limpet growth:

  • 2 enclosure sizes (factor A)
  • 5 replicate enclosures (factor B)
  • 5 replicate limpets per enclosure

Lecture 13: Nested designs: examples

Study on reef fish recruitment:

  • 5 sites (factor A)
  • 6 transects at each site (factor B)
  • replicate observations along each transect

Lecture 13: Nested designs: examples

Effects of sea urchin grazing on biomass of filamentous algae:

  • 4 levels of urchin grazing: none, low, medium, high
  • 4 patches of rocky bottom (3-4 m2) nested in each level of grazing
  • 5 replicate quadrats per patch


Lecture 13: Factorial designs: examples

Effects of light level on growth of seedlings of different size:

  • 3 light levels (factor A)
  • 3 size classes (factor B)
  • 5 replicate seedlings in each cell

Lecture 13: Factorial designs: examples

Effects of food level and tadpole presence on larval salamander growth

  • 2 food levels (factor A)
  • presence/absence of tadpoles (factor B)
  • 8 replicates in each cell

Lecture 13: Factorial designs: examples

Effect of season and density on limpet fecundity.

  • 2 seasons (factor A)
  • 4 density treatments (factor B)
  • 3 replicates in each cell


Lecture 13: Nested designs: linear model

Consider a nested design with:

  • p levels of factor A (i = 1…p) (e.g., 4 grazing levels)
  • q levels of factor B (j = 1…q), nested within each level of A (e.g., 4 different patches per grazing level)
  • n replicates (k = 1…n) in each combination of A and B (e.g., 5 replicate quadrats in each patch in each grazing level)


Lecture 13: Nested designs: linear model

Can calculate several means:

  • the overall mean (across all levels of A and B) = \(\bar{y}\)
  • a mean for each level of A (across all levels of B in that A) = \(\bar{y}_i\)
  • a mean for each level of B within each A = \(\bar{y}_{j(i)}\)

Lecture 13: Nested designs: linear model


The linear model for a nested design is: \[y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk}\]

Where:

  • \(y_{ijk}\) is the response variable

    • value of the k-th replicate in j-th level of B in the i-th level of A

    • (algal biomass in 3rd quadrat, in 2nd patch in low grazing treatment)

  • \(\mu\) is the overall mean

    • (overall average algal biomass)

Lecture 13: Nested designs: linear model

The linear model for a nested design is: \[y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk}\]

  • \(\alpha_i\) is the fixed effect of the i-th level of factor A

  • (difference between the average biomass across all low grazing quadrats and the overall mean)

  • \(\beta_{j(i)}\) is the random effect of the j-th level of B nested within the i-th level of A

  • usually a random variable, measuring the variance among all possible levels of B within each level of A

  • (variance among all possible patches that could have been used in the low grazing treatment)

Lecture 13: Nested designs: linear model

The linear model for a nested design is: \[y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk}\]

  • \(\varepsilon_{ijk}\) is the error term: the unexplained variance associated with the k-th replicate in the j-th level of B in the i-th level of A
  • (difference between the observed algal biomass in the 3rd quadrat in the 2nd patch in the low grazing treatment and the predicted biomass, i.e., the average biomass in that patch)
  • (recall that \(\alpha_i\), the effect of the i-th level of A, equals \(\mu_i - \mu\))

Lecture 13: Nested designs: analysis of variance

As before, we partition the variance in the response variable using sums of squares (SS). SSA is the SS of differences between the mean of each level of A and the overall mean.

Lecture 13: Multifactor ANOVA

SSB is the SS of differences between the mean of each level of B and the mean of its corresponding level of A, summed across levels of A.

Lecture 13: Nested designs: analysis of variance

  • SSresid is the difference between each observation and the mean for its level of factor B, summed over all observations
  • SStotal = SSA + SSB + SSresid
  • each SS can be turned into an MS by dividing by the appropriate df
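These quantities can be checked by hand. A minimal sketch for a made-up nested dataset (2 levels of A, 2 patches nested in each level, 3 quadrats per patch; all values invented):

```python
# A made-up nested dataset: 2 grazing levels (factor A), 2 patches nested in
# each level (factor B), 3 quadrat values per patch.
data = {
    ("low",  "p1"): [4.0, 5.0, 6.0],
    ("low",  "p2"): [7.0, 8.0, 9.0],
    ("high", "p1"): [1.0, 2.0, 3.0],
    ("high", "p2"): [2.0, 3.0, 4.0],
}

def mean(xs):
    return sum(xs) / len(xs)

all_obs = [y for ys in data.values() for y in ys]
grand = mean(all_obs)

levels_A = {a for (a, _) in data}
# Observations and mean for each level of A (across its patches)
obs_A = {a: [y for (aa, _), ys in data.items() if aa == a for y in ys]
         for a in levels_A}
mean_A = {a: mean(ys) for a, ys in obs_A.items()}
# Mean of each patch (each level of B within A)
mean_B = {key: mean(ys) for key, ys in data.items()}

SS_A = sum(len(obs_A[a]) * (mean_A[a] - grand) ** 2 for a in levels_A)
SS_B = sum(len(ys) * (mean_B[(a, b)] - mean_A[a]) ** 2
           for (a, b), ys in data.items())
SS_resid = sum((y - mean_B[key]) ** 2 for key, ys in data.items() for y in ys)
SS_total = sum((y - grand) ** 2 for y in all_obs)

# The partition holds: SStotal = SSA + SSB + SSresid
assert abs(SS_total - (SS_A + SS_B + SS_resid)) < 1e-9
```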

Lecture 13: Nested designs: analysis of variance

Lecture 13: Nested designs: null hypotheses

Two hypotheses are tested on the MS values:

  1. No effect of factor A
  • Assuming A is fixed:
  • \(H_0(A): \mu_1 = \mu_2 = \mu_3 = \dots = \mu_i = \mu\)
  • Same as in 1-factor ANOVA, using the means of the B levels nested within each level of A
  • (no difference in algal biomass across all levels of grazing: \(\mu_{none} = \mu_{low} = \mu_{med} = \mu_{high}\))

Lecture 13: Nested designs: null hypotheses

Two hypotheses are tested on the MS values:

  2. No effect of factor B nested in A
  • Assuming B is random:
  • \(H_0(B): \sigma_{\beta}^2 = 0\) (no variance added due to differences among all possible levels of B)
  • (no variance added due to differences between patches)
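The denominators matter here: with A fixed and B random, the F for factor A is tested over MS of B(A), not over the residual MS. A tiny numeric sketch (all MS values hypothetical):

```python
# Hypothetical mean squares for a nested ANOVA with p = 4 grazing levels,
# q = 4 patches nested per level, n = 5 quadrats per patch.
p, q, n = 4, 4, 5
MS_A, MS_B_in_A, MS_resid = 12.0, 6.0, 1.5

# A fixed, B random: the F for A uses MS_B(A) as its denominator
F_A = MS_A / MS_B_in_A       # df = (p - 1, p * (q - 1))            -> (3, 12)
F_B = MS_B_in_A / MS_resid   # df = (p * (q - 1), p * q * (n - 1))  -> (12, 64)
```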

Lecture 13: Nested designs: null hypotheses

Conclusions?

“significant variation between replicate patches within each treatment, but no significant difference in amount of filamentous algae between treatments”

Lecture 13: Nested designs: unbalanced designs

Unequal sample sizes can be because of:

  • uneven number of B levels within each A
  • uneven number of replicates within each level of B

Not a problem, unless variances are unequal or there are large deviations from normality.

Lecture 13: Nested designs: assumptions

As usual, we assume normality, equal variance, and independence of the errors.

Equal variance and normality need to be assessed at both levels: among replicates within each level of B, and among the B-level means within each level of A.

Lecture 11 - Multiple Regression
                     Independent variable
Dependent variable   Continuous            Categorical
Continuous           Regression            ANOVA
Categorical          Logistic regression   Tabular

Lecture 11: Analyses

Abundance of C3 grasses can be modeled as function of

  • latitude
  • longitude
  • both

Instead of line, modeled with (hyper)plane

Lecture 11: Analyses

Used in similar way to simple linear regression:

  • Describe nature of relationship between Y and X’s
  • Determine explained / unexplained variation in Y
  • Predict new Ys from X
  • Find the “best” model


Lecture 11: Analyses

Crawley 2012: “Multiple regression models provide some of the most profound challenges faced by the analyst”:

  • Overfitting
  • Parameter proliferation
  • Multicollinearity
  • Model selection

Lecture 11: Analyses

Multiple Regression:

\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]
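A minimal sketch of how the \(\beta\)s are estimated: solve the normal equations \((X'X)\beta = X'y\). The data below are invented with no error term, so ordinary least squares recovers the coefficients exactly (y = 2 + 3·x1 − x2):

```python
# Invented data with no error: y = 2 + 3*x1 - x2 exactly, so OLS should
# recover b0 = 2, b1 = 3, b2 = -1.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [2 + 3 * a - b for a, b in zip(x1, x2)]

X = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix with intercept column

# Normal equations: (X'X) beta = X'y
XtX = [[sum(row[r] * row[c] for row in X) for c in range(3)] for r in range(3)]
Xty = [sum(row[r] * yi for row, yi in zip(X, y)) for r in range(3)]

# Solve the 3x3 system by Gaussian elimination with partial pivoting
A = [XtX[r][:] + [Xty[r]] for r in range(3)]
for col in range(3):
    piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    for r in range(col + 1, 3):
        f = A[r][col] / A[col][col]
        A[r] = [a - f * b for a, b in zip(A[r], A[col])]

beta = [0.0, 0.0, 0.0]
for r in (2, 1, 0):
    beta[r] = (A[r][3] - sum(A[r][c] * beta[c] for c in range(r + 1, 3))) / A[r][r]
```

In practice you would let `lm()` in R (or an equivalent routine) do this; the point is only that the fitted plane comes from one linear solve.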

Lecture 11: Multiple linear regression model

Multiple Regression:

\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]

Lecture 11: Regression parameters

Multiple Regression:

\[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_p x_{ip} + \epsilon_i\]


Lecture 11: Regression parameters

Regression equation can be used for prediction by subbing new values for predictor (X) variables

Lecture 11: Analyses of variance

Variance: SStotal is partitioned into SSregression and SSresidual

Source of variation SS df MS Interpretation
Regression \(\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2\) \(p\) \(\frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{p}\) Difference between predicted observation and mean
Residual \(\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\) \(n-p-1\) \(\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-p-1}\) Difference between each observation and predicted
Total \(\sum_{i=1}^{n} (y_i - \bar{y})^2\) \(n-1\) Difference between each observation and mean

Lecture 11: Analyses

SS converted to non-additive MS (SS/df)

Source of variation SS df MS
Regression \(\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2\) \(p\) \(\frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{p}\)
Residual \(\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\) \(n-p-1\) \(\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-p-1}\)
Total \(\sum_{i=1}^{n} (y_i - \bar{y})^2\) \(n-1\)

Lecture 11: Hypotheses

Two H₀s are usually tested in MLR: that all partial regression slopes are zero (the overall regression; F-test), and that each individual slope is zero.

Lecture 11: Hypotheses

Also: is any specific β = 0 (explanatory role)?

Lecture 11: Hypotheses

\[F_{w,\,n-p-1} = \frac{MS_{Extra}}{MS_{Residual(full)}}\] where \(w\) is the number of parameters dropped to form the reduced model. Can also use a t-test (R provides this value)

Lecture 11: Explained variance

Explained variance (r2) is calculated the same way as for simple regression:

\[r^2 = \frac{SS_{Regression}}{SS_{Total}} = 1 - \frac{SS_{Residual}}{SS_{Total}} \]
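A quick check, on a made-up simple-regression dataset, that the two expressions agree (they must, since \(SS_{Total} = SS_{Regression} + SS_{Residual}\)):

```python
# Made-up data lying close to a straight line; fit simple OLS by the
# textbook slope/intercept formulas, then compute r^2 both ways.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

SS_total = sum((yi - ybar) ** 2 for yi in y)
SS_resid = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
SS_reg = sum((yh - ybar) ** 2 for yh in yhat)

r2_a = SS_reg / SS_total          # explained / total
r2_b = 1 - SS_resid / SS_total    # 1 - unexplained / total
assert abs(r2_a - r2_b) < 1e-9    # identical, by the SS partition
```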

Lecture 11: Assumptions and diagnostics

  • Assume fixed Xs; unrealistic in most biological settings
  • No major (influential) outliers
  • Check leverage and influence (Cook’s \(D_i\))

Lecture 11: Assumptions and diagnostics

  • Normality, equal variance, independence
  • Residual QQ-plots, residuals vs. predicted values plot
  • Distribution/variance often corrected by transforming Y

Lecture 11: Assumptions and diagnostics

More observations than predictor variables

Lecture 11: Analyses

Regression of Y vs. each X separately does not account for the effects of the other predictors:

we want to know the shape of the relationship while holding the other predictors constant


Lecture 11: Collinearity


Collinearity can be detected by:

  • examining pairwise correlations among the predictors
  • variance inflation factors (VIF); large values (commonly VIF > 10) indicate a problem

Lecture 11: Interactions

Predictors can be modeled as:

\[y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \epsilon_i \quad \text{vs.} \quad y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i1}X_{i2} + \epsilon_i\]

“Curvature” of the regression (hyper)plane
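A small sketch of what the interaction coefficient does (all coefficients invented): in the additive model the slope of y in X1 is the same at every X2; with the interaction term it changes with X2, which is the "curvature":

```python
# Made-up coefficients for an additive plane vs. one with an interaction.
b0, b1, b2, b3 = 1.0, 2.0, 0.5, 1.5

def y_additive(x1, x2):
    return b0 + b1 * x1 + b2 * x2

def y_interact(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# Slope in x1 (change over one unit of x1) at two levels of x2:
slope_add_at_0 = y_additive(1, 0) - y_additive(0, 0)  # 2.0
slope_add_at_2 = y_additive(1, 2) - y_additive(0, 2)  # 2.0 -- unchanged
slope_int_at_0 = y_interact(1, 0) - y_interact(0, 0)  # 2.0
slope_int_at_2 = y_interact(1, 2) - y_interact(0, 2)  # 2.0 + 1.5*2 = 5.0
```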

Lecture 11: Analyses


Adding interactions:

Lecture 11: Dummy variables

Multiple linear regression accommodates both continuous and categorical variables (gender, vegetation type, etc.). Categorical variables enter as “dummy” variables; a variable with k categories requires k − 1 dummy variables.

Sex M/F: one dummy variable

Fertility L/M/H: two dummy variables

Fertility fert1 fert2
Low 0 0
Med 1 0
High 0 1
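The table above can be sketched as a small coding helper (the function `dummy_code` is ours, for illustration, not from any package):

```python
# "Low" is the reference category, so it gets (0, 0); the other two levels
# each switch on one indicator column.
def dummy_code(level):
    return {"Low": (0, 0), "Med": (1, 0), "High": (0, 1)}[level]

# A two-level factor like Sex (M/F) needs just one dummy, e.g. M = 0, F = 1.
fertility = ["Low", "Med", "High", "Med"]
rows = [dummy_code(f) for f in fertility]
# rows == [(0, 0), (1, 0), (0, 1), (1, 0)]
```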

Lecture 11: Analyses

Coefficients interpreted relative to reference condition

Fertility fert1 fert2
Low 0 0
Med 1 0
High 0 1

Lecture 11: Analyses


Lecture 11: Comparing models

When we have multiple predictors (and interactions!), many candidate models can be fit.

To choose among them, we balance fit against complexity: adding predictors always increases the apparent fit, which risks

Overfitting

Lecture 11: Comparing models

Need to account for increase in fit with added predictors:

\[\text{Adjusted } r^2 = 1 - \frac{SS_{\text{Residual}}/(n - (p + 1))}{SS_{\text{Total}}/(n - 1)}\]

\[\text{Akaike Information Criterion (AIC)} = n\ln(SS_{\text{Residual}}) + 2(p + 1) - n\ln(n)\]
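Plugging hypothetical numbers into the two formulas (a sketch, not real data):

```python
from math import log

# Hypothetical model summary: 30 observations, 3 predictors.
n, p = 30, 3
SS_resid, SS_total = 20.0, 100.0

r2 = 1 - SS_resid / SS_total
adj_r2 = 1 - (SS_resid / (n - (p + 1))) / (SS_total / (n - 1))
aic = n * log(SS_resid) + 2 * (p + 1) - n * log(n)

# The adjustment penalizes extra parameters, so adjusted r^2 <= r^2
assert adj_r2 <= r2
```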

Lecture 11: Comparing models

But how to compare models?

We will use a manual form of backward selection
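The backward-selection idea can be sketched as a loop that drops one term at a time while AIC keeps improving. The residual SS for each candidate model below is invented; in practice each value would come from a fitted model:

```python
from math import log

n = 30
ss_resid = {                   # terms in model -> residual SS (made up)
    ("x1", "x2", "x3"): 18.0,
    ("x1", "x2"):       18.5,  # dropping x3 barely hurts the fit
    ("x1", "x3"):       25.0,
    ("x2", "x3"):       30.0,
    ("x1",):            40.0,
    ("x2",):            55.0,
}

def aic(terms):
    p = len(terms)
    return n * log(ss_resid[terms]) + 2 * (p + 1) - n * log(n)

current = ("x1", "x2", "x3")
while True:
    # All candidate models with exactly one term dropped from the current one
    candidates = [t for t in ss_resid
                  if set(t) < set(current) and len(t) == len(current) - 1]
    best = min(candidates, key=aic, default=None)
    if best is None or aic(best) >= aic(current):
        break                  # no single-term deletion improves AIC: stop
    current = best             # dropping a term improved AIC: accept it
```

With these invented values the loop drops x3 and then stops, keeping x1 and x2.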

Lecture 11: Analyses

Lecture 11: Predictors

Usually we want to know the relative importance of each predictor in explaining Y

Lecture 11: Predictors

Using F-tests (or t-tests) on partial regression slopes:

Lecture 11: Predictors

Using coefficient of partial determination:

\[r_{X_j}^2 = \frac{SS_{\text{Extra}}}{\text{Reduced }SS_{\text{Residual}}}\]
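A numeric sketch tying together the extra SS, the partial F, and the coefficient of partial determination for one predictor \(X_j\) (all SS values and df hypothetical):

```python
# Hypothetical full vs. reduced model fits (X_j dropped in the reduced model).
n, p_full = 30, 3
SS_resid_full = 20.0
SS_resid_reduced = 32.0

SS_extra = SS_resid_reduced - SS_resid_full      # fit gained by including X_j
w = 1                                            # parameters dropped
F_partial = (SS_extra / w) / (SS_resid_full / (n - p_full - 1))
r2_partial = SS_extra / SS_resid_reduced         # coefficient of partial determination
```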


Lecture 11: Predictors

Using standardized partial regression slopes:

Lecture 11: Predictors

Using partial r2 values:

Lecture 11: Reporting results

Results are easiest to report in tabular format

